Building Digital Libraries from Simple Building Blocks
Abstract
Metadata harvesting has been established by the Open Archives Initiative (OAI) as a viable mechanism for connecting a provider of data to a purveyor of services. The Open Digital Library (ODL) model is an emerging framework that attempts to break up the services into appropriate components, building on the same basic philosophy as the OAI model. This framework has been applied to various projects and evaluated for its simplicity, extensibility and reusability to support the hypothesis that digital libraries (DLs) should be built from simple Web Service-like components instead of as monolithic software applications.

Introduction

Work in the field of digital libraries was launched in the early 1990s (Fox, 1993). Since then there has been a rapid expansion of research and development, integrating work from related fields and involving hundreds of projects (Fox and Sornil, 1999; Fox and Urs, 2002). Yet it is still rather difficult to build a digital library. This fact suggests that a new approach, building upon earlier research and integrating key concepts from modern software engineering practice, is needed. Accordingly, we provide background regarding related work on the Open Archives Initiative (OAI), introduce the Open Digital Library (ODL) model and framework, illustrate ODL’s applicability by considering a number of case studies, discuss its evaluation with respect to performance and ease of use, and outline future directions.

The Open Archives Initiative

Background and Motivation

The Open Archives Initiative (OAI) was launched in response to a recognised need for low-cost interoperability solutions in the digital library community (Van de Sompel and Lagoze, 2000; Suleman and Fox, 2002a). Besides connecting systems in distributed digital libraries, the OAI addressed problems arising from collocation of data and services.
Acknowledging that the owners of high quality data were not always the best candidates to provide high quality services, the OAI encouraged a multi-layered approach to system development, with the data collection distinctly separated from the services provided. To make the connection between data and services, the OAI developed a Web-based network protocol for simple data transfer, the Protocol for Metadata Harvesting (PMH) (Lagoze, et al., 2002). By transferring only metadata instead of supporting remote searching, this protocol takes the burden off data providers and places it on service providers, making it easier for those who collect data to also share it. The OAI-PMH enables remote access to collections of metadata, thus enabling the development of interesting services such as the Torii portal (Bertocco, 2001) for resource discovery and annotation, and the Open Citation Project (Hitchcock, et al., 2002) for reference linking of resources.

The OAI-PMH is used by many existing popular DL archives to expose their previously opaque collections. In addition, new distributed DL projects, such as the revitalised NCSTRL (Anan, et al., 2002), use a system model that is based on harvesting metadata from multiple remote sites into one or more central user portals.

Protocol for Metadata Harvesting

The OAI-PMH is a client-server protocol layered over HTTP, using CGI-encoded parameters in requests and XML-encoded data in responses. The aim of the protocol is to support the batch transfer of metadata from a server (data provider) to a client (service provider) using incremental updates whenever a transfer is initiated. This process of obtaining all the (new) metadata from a server, instead of only that which satisfies a search query, is commonly known as harvesting. The OAI-PMH is made up of six requests and associated responses, three of which are administrative while the other three are for data transfer.
These requests, and the semantics of their responses, are as follows:
• Identify – general information about the archive, administrator and policies.
• ListMetadataFormats – a list of all the metadata formats supported by the archive as well as their XML namespaces and schema locations.
• ListSets – a list of all the subsections of the archive for selective harvesting.
• ListIdentifiers – a list of identifiers for all records, corresponding to the required metadata format parameter and optional date range and/or set parameters.
• GetRecord – a single record, specified by its unique identifier and metadata format.
• ListRecords – a list of records in the specified metadata format, corresponding to optional date range and/or set parameters.

Open Digital Libraries

Concept

In developing the metadata harvesting protocol, the OAI provided a mechanism to separate data providers from service providers. As part of this process, the OAI established best practices that support its protocol but are also potentially relevant to digital library design in general. Included among these best practices are the enforcement of identifier uniqueness and the ability to obtain a single record from a source repository based solely on its identifier, metadata format and a network address for the source repository. These are fundamental ideas that were part of Kahn and Wilensky’s Repository Access Protocol (Kahn and Wilensky, 1995) and which have now been realised in OAI’s broadly-supported DL interoperability protocol. The Open Digital Library (ODL) project (Suleman and Fox, 2001) has exploited the conceptual framework provided by the OAI protocol to form the base for a general-purpose inter-component interaction protocol for digital libraries. Digital libraries have reached a stage in development where they can be specified in terms of standard suites of services.
Discussion on architecture and models has consistently noted the need for flexible component models (Gladney, et al., 1994; DELOS, 2001). The OAI protocol fulfils some of this need by providing the mechanism by which mini- and dumb archives can be set up. It is no longer necessary for such archives to provide user-directed services, since these services can be delegated to appropriate service providers. Thus, the data providers form the basic components of a digital library, with the sole requirement being to export data. The layers built upon this fall within the ambit of the ODL project.

ODL defines popular services as self-contained components and defines interfaces for these components to interact with upstream data providers and peer components, as well as downstream components and elements of user interfaces. As upstream archives support the OAI-PMH, which already contains many desirable elements of digital library design, it was decided to model the inter-component interaction protocols as extensions of the OAI-PMH. Each component is then an extended Open Archive, and the digital library is made up of a network of extended Open Archives, denoted as ODL service components in Figure 1.
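
Since the ODL inter-component protocols are modelled as extensions of the OAI-PMH, it may help to see what a basic OAI-PMH request looks like. The following Python sketch builds request URLs for the six verbs named under Protocol for Metadata Harvesting, including a ListRecords request restricted to records changed since the last harvest. The parameter names `verb`, `metadataPrefix`, `from` and `set` and the `oai_dc` prefix follow the OAI-PMH specification; the base URL `http://example.org/oai` and the function names `build_request` and `incremental_harvest_url` are hypothetical and not taken from the paper. The sketch only constructs URLs and makes no network request.

```python
from typing import Optional
from urllib.parse import urlencode

# The six OAI-PMH verbs, grouped as in the text:
# three administrative requests and three data-transfer requests.
ADMIN_VERBS = ("Identify", "ListMetadataFormats", "ListSets")
DATA_VERBS = ("ListIdentifiers", "GetRecord", "ListRecords")


def build_request(base_url: str, verb: str, **params: Optional[str]) -> str:
    """Return the GET URL for one OAI-PMH request (hypothetical helper)."""
    if verb not in ADMIN_VERBS + DATA_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    query = {"verb": verb}
    for name, value in params.items():
        if value is not None:
            # 'from' is a Python keyword, so callers pass 'from_';
            # the trailing underscore is stripped here.
            query[name.rstrip("_")] = value
    return f"{base_url}?{urlencode(query)}"


def incremental_harvest_url(
    base_url: str,
    metadata_prefix: str,
    last_harvest: Optional[str] = None,
    set_spec: Optional[str] = None,
) -> str:
    """Build a ListRecords request for records changed since last_harvest.

    With last_harvest=None the request asks for the whole collection,
    which corresponds to a first, full harvest.
    """
    return build_request(
        base_url,
        "ListRecords",
        metadataPrefix=metadata_prefix,
        from_=last_harvest,
        set=set_spec,
    )


if __name__ == "__main__":
    base = "http://example.org/oai"  # assumed endpoint, for illustration only
    print(build_request(base, "Identify"))
    print(incremental_harvest_url(base, "oai_dc", last_harvest="2002-06-01"))
```

A service provider would typically store the date of each successful harvest and pass it as `last_harvest` on the next run, so that only new or changed metadata is transferred.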